Search Result

Federated learning algorithm for communication cost optimization

ZHENG Sai, LI Tianrui, HUANG Wei

Journal of Computer Applications 2023, 43 (1): 1-7. DOI: 10.11772/j.issn.1001-9081.2021122054

Federated Learning (FL) is a machine learning setting that can protect data privacy; however, high communication cost and client heterogeneity hinder its large-scale deployment. To address these two problems, a federated learning algorithm for communication cost optimization was proposed. First, the server received the generative models from the clients and used them to generate simulated data. Then, the server trained the global model on the simulated data and sent it to the clients, and each client obtained its final model by fine-tuning the global model. The proposed algorithm needed only one round of communication between the clients and the server, and the fine-tuning of the client models was used to address client heterogeneity. Experiments were carried out on the MNIST and CIFAR-10 datasets with 20 clients. The results show that, while maintaining accuracy, the proposed algorithm reduces the amount of communicated data to 1/10 of that of the Federated Averaging (FedAvg) algorithm on the MNIST dataset and to 1/100 of that of FedAvg on the CIFAR-10 dataset.
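As a rough illustration of why a single communication round can cut traffic, the sketch below compares the bytes exchanged by a multi-round FedAvg run with those of a one-round scheme in which each client uploads a generative model once and downloads the global model once. The function names, the 4-byte parameter size, and all model sizes are assumptions for illustration; they are not taken from the paper and are not meant to reproduce its reported ratios.

```python
def fedavg_bytes(model_params, rounds, clients, bytes_per_param=4):
    """Traffic for FedAvg, assuming every client downloads and uploads
    the full model in every round."""
    return 2 * model_params * bytes_per_param * clients * rounds


def one_round_bytes(generator_params, model_params, clients, bytes_per_param=4):
    """Traffic for a one-round scheme: each client uploads its generator
    once and downloads the trained global model once."""
    return (generator_params + model_params) * bytes_per_param * clients


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to show how the ratio is computed.
    baseline = fedavg_bytes(model_params=1_000_000, rounds=50, clients=20)
    proposed = one_round_bytes(generator_params=500_000,
                               model_params=1_000_000, clients=20)
    print(f"one-round traffic / FedAvg traffic = {proposed / baseline:.4f}")
```

Under these assumptions the ratio depends mainly on the number of FedAvg rounds and on how large the generator is relative to the global model.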

Early diagnosis and prediction of Parkinson's disease based on clustering medical text data

ZHANG Xiaobo, YANG Yan, LI Tianrui, LU Fan, PENG Lilan

Journal of Computer Applications 2020, 40 (10): 3088-3094. DOI: 10.11772/j.issn.1001-9081.2020030359

To address the early intelligent diagnosis of Parkinson's Disease (PD), which is more common in the elderly, clustering techniques based on medical examination text data were proposed for the analysis and prediction of PD. Firstly, the original dataset was pre-processed to obtain effective feature information, and Principal Component Analysis (PCA) was used to reduce these features to eight feature spaces of different dimensions. Then, five classical clustering models and three clustering ensemble methods were each used to cluster the data in the eight feature spaces. Finally, four clustering performance indexes were selected to evaluate the prediction of PD subjects with dopamine deficiency, healthy controls, and Scans Without Evidence of Dopamine Deficiency (SWEDD) PD subjects. The simulation results show that the clustering accuracy of the Gaussian Mixture Model (GMM) reaches 89.12% when the PCA feature dimension is 30, that of Spectral Clustering (SC) is 61.41% when the PCA feature dimension is 70, and that of the Meta-CLustering Algorithm (MCLA) is 59.62% when the PCA feature dimension is 80. The comparative experiments show that GMM has the best clustering performance among the five classical clustering methods when the PCA feature dimension is less than 40, and that MCLA shows excellent clustering performance among the three clustering ensemble methods across the different feature dimensions. These results provide technical and theoretical support for the early intelligent auxiliary diagnosis of PD.
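The abstract reports clustering accuracy without stating how cluster ids were matched to diagnostic classes. One common convention, assumed here, is to score the best one-to-one mapping from clusters to classes. The function name and the toy labels below are hypothetical.

```python
from itertools import permutations


def clustering_accuracy(true_labels, cluster_ids):
    """Accuracy under the best one-to-one mapping of cluster ids to classes.

    Brute force over permutations, which is fine for a handful of classes.
    """
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    if len(clusters) > len(classes):
        raise ValueError("more clusters than classes")
    best_hits = 0
    for assigned in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, assigned))
        hits = sum(mapping[c] == t for c, t in zip(cluster_ids, true_labels))
        best_hits = max(best_hits, hits)
    return best_hits / len(true_labels)


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the study.
    truth = ["PD", "PD", "HC", "HC", "SWEDD", "SWEDD"]
    found = [1, 1, 0, 0, 2, 1]
    print(clustering_accuracy(truth, found))
```

For larger numbers of classes, an assignment solver would replace the permutation loop.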

Microoperation-based parameter auto-optimization method of Hadoop

LI Yunshu, TENG Fei, LI Tianrui

Journal of Computer Applications 2019, 39 (6): 1589-1594. DOI: 10.11772/j.issn.1001-9081.2018122592

As a large-scale distributed data processing framework, Hadoop has been widely used in industry in recent years. Manual parameter optimization and experience-based parameter optimization are currently ineffective because of the complex running process and the large parameter space. To solve this problem, a method and an analytical framework for Hadoop parameter auto-optimization were proposed. Firstly, the execution of a job was broken down into several microoperations, defined at a finer granularity at which they are directly affected by the variable parameters, so that the relationship between the parameters and the execution time of a single microoperation could be analyzed. Then, by reconstructing the job execution process from the microoperations, a model of the relationship between the parameters and the execution time of the whole job was established. Finally, various search-based optimization algorithms were applied to this model to obtain the optimized system parameters quickly and efficiently. Experiments were conducted with two types of jobs, terasort and wordcount. The experimental results show that, compared with the default parameters, the proposed method reduces the job execution time by at least 41% and 30% respectively. The proposed method can effectively improve the job execution efficiency of Hadoop and shorten the job execution time.
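The modeling idea in this abstract, per-microoperation time models summed into a whole-job model that is then searched, can be sketched as follows. Everything concrete here is an assumption for illustration: the three microoperations, their cost formulas, the parameter names `sort_buffer_mb` and `reduce_tasks`, and the use of exhaustive grid search as a stand-in for the search algorithms the paper applied.

```python
from itertools import product


# Hypothetical microoperation models: each maps a parameter dict to seconds.
def read_time(params):
    return 40.0  # assumed to be independent of the tuned parameters


def sort_spill_time(params):
    buf = params["sort_buffer_mb"]
    return 2000.0 / buf + 0.05 * buf


def shuffle_reduce_time(params):
    tasks = params["reduce_tasks"]
    return 120.0 / tasks + 1.5 * tasks


MICROOPERATIONS = [read_time, sort_spill_time, shuffle_reduce_time]


def job_time(params):
    """Whole-job model: the sum of the microoperation times."""
    return sum(op(params) for op in MICROOPERATIONS)


def grid_search(space):
    """Evaluate every parameter combination and return the fastest one."""
    names = list(space)
    candidates = (dict(zip(names, values)) for values in product(*space.values()))
    best = min(candidates, key=job_time)
    return best, job_time(best)


if __name__ == "__main__":
    space = {"sort_buffer_mb": [50, 100, 200, 400], "reduce_tasks": [2, 4, 8, 16]}
    print(grid_search(space))
```

Because the model is cheap to evaluate, the search never has to run a real job, which is what makes searching a large parameter space practical.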

Measure method and properties of weighted hypernetwork

LIU Shengjiu, LI Tianrui, YANG Zonglin, ZHU Jie

Journal of Computer Applications 2019, 39 (11): 3107-3113. DOI: 10.11772/j.issn.1001-9081.2019050806

A hypernetwork is a kind of network that is more complex than an ordinary complex network. Because each of its hyperedges can connect any number of nodes, a hypernetwork can describe complex systems in the real world more appropriately than a complex network. To address the shortcomings of existing hypernetwork measures, a new measure, the Hypernetwork Dimension (HD), was proposed. The hypernetwork dimension was expressed as twice the ratio of the logarithm of the sum, over all hyperedges, of the products of the nodes' weights and the corresponding hyperedge's weight to the logarithm of the product of the sum of the hyperedges' weights and the sum of the nodes' weights. The hypernetwork dimension can be applied to weighted hypernetworks whose node weights and hyperedge weights take many different numerical types, such as positive real numbers, negative real numbers, pure imaginary numbers, and even complex numbers. Finally, several important properties of the proposed hypernetwork dimension were discussed.

Visualization of time series data based on spiral graph

YANG Huanhuan, LI Tianrui, CHEN Xindi

Journal of Computer Applications 2017, 37 (9): 2443-2448. DOI: 10.11772/j.issn.1001-9081.2017.09.2443

Phased time series data is common in daily life. It describes an event that contains a number of state transitions. Each state has a time attribute, and there are multiple paths between state transitions. Because existing visualization techniques do not adequately show the transitions between phases or the time variation of the paths between states, a novel visualization model based on the spiral graph was proposed. In the proposed model, each state was represented by a circle, the states of an event were represented by a set of concentric circles, and the reachable paths between neighboring states were represented by spirals. The start point of each spiral depended on its start time and start state, and the end point depended on its end time and end state. To solve the overlapping problem caused by a large number of paths, a transparency adjustment algorithm based on a long-tailed function was applied to the paths. The transparency of each path was assigned according to the number of its intersections with other paths. Flexible interactive facilities such as path filtering, highlighting, pop-up boxes and zooming were provided to support efficient data exploration. The proposed model was applied to China railway data. The experimental results show that the model can effectively display trains of any running duration in limited space, can reduce the clutter caused by overlapping paths when there are many trains while preserving the train information, and can provide decision support for users' route choices. This validates the effectiveness of the proposed model in visualizing phased time series data.

Entity alignment of Chinese heterogeneous encyclopedia knowledge base

HUANG Junfu, LI Tianrui, JIA Zhen, JING Yunge, ZHANG Tao

Journal of Computer Applications 2016, 36 (7): 1881-1886. DOI: 10.11772/j.issn.1001-9081.2016.07.1881

Because traditional entity alignment algorithms may perform poorly on Chinese heterogeneous encyclopedia knowledge bases, an entity alignment method based on entity attributes and context topic features was proposed. First, a Chinese heterogeneous encyclopedia knowledge base was constructed from Baidu encyclopedia and Hudong encyclopedia data. Next, a Resource Description Framework Schema (RDFS) vocabulary list was built to normalize the entity attributes. Then, the entity context information was extracted and Chinese word segmentation was applied to the contexts. The contexts were modelled with a topic model, and the parameters were computed by the Gibbs sampling method. After that, the topic-word probability matrix, the characteristic word collection and the corresponding feature matrix were calculated. Last, the Longest Common Subsequence (LCS) algorithm was used to compute the entity attribute similarity. When the similarity was between the lower and upper bounds, the topic features of the entities' contexts were combined to resolve the entity alignment problem. For the simulation experiments, an entity alignment data set of Chinese heterogeneous encyclopedias was constructed according to the standard method. Compared with the traditional property similarity algorithm, the weighted-property algorithm, the context term frequency feature model and the topic model algorithm, the proposed method achieves 97.8% accuracy, 88.0% recall and 92.6% F-score in the people class, and 98.6% accuracy, 73.0% recall and 83.9% F-score in the movie class, outperforming the other entity alignment algorithms. The experimental results also indicate that the proposed method can improve entity alignment results when constructing a Chinese heterogeneous encyclopedia knowledge base, and that it can be applied to entity alignment tasks with context information.
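The attribute-similarity step can be sketched with a standard dynamic-programming LCS. The normalization to a 0..1 score, the threshold values, and the three-way decision (accept above the upper bound, reject below the lower bound, otherwise fall back to topic features) are assumptions for illustration; the abstract only states that topic features are combined when the similarity lies between the bounds.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two strings."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        curr = [0]
        for j, other in enumerate(b, start=1):
            if ch == other:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a, b):
    """Assumed normalization: 2 * LCS / (len(a) + len(b)), in [0, 1]."""
    if not a or not b:
        return 0.0
    return 2 * lcs_length(a, b) / (len(a) + len(b))


def alignment_decision(similarity, lower=0.4, upper=0.8):
    """Hypothetical thresholds; only the middle band needs topic features."""
    if similarity >= upper:
        return "aligned"
    if similarity <= lower:
        return "not aligned"
    return "check topic features"


if __name__ == "__main__":
    score = lcs_similarity("1980-05-12", "1980.05.12")
    print(score, alignment_decision(score))
```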

Fault diagnosis method of high-speed rail based on compute unified device architecture

CHEN Zhi, LI Tianrui, LI Ming, YANG Yan

Journal of Computer Applications 2015, 35 (10): 2819-2823. DOI: 10.11772/j.issn.1001-9081.2015.10.2819

Because traditional fault diagnosis of High-Speed Rail (HSR) vibration signals is slow and cannot meet the practical requirement of real-time processing, an accelerated fault diagnosis method for HSR vibration signals based on Compute Unified Device Architecture (CUDA) was proposed. First, the HSR data was processed by CUDA-based Empirical Mode Decomposition (EMD), and the fuzzy entropy of each resulting component was calculated. Finally, the K-Nearest Neighbor (KNN) classification algorithm was used to classify the feature space consisting of multiple fuzzy entropy features. The experimental results show that the proposed method is effective for fault classification of HSR vibration signals, and that the processing speed is significantly improved compared with the traditional method.
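The final classification step can be sketched as a plain KNN vote over feature vectors. The sketch assumes the fuzzy entropy features have already been computed (the EMD and fuzzy entropy stages are not shown), uses Euclidean distance, and runs on the CPU; the feature values and class names are made up for illustration.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Label a query vector by majority vote among its k nearest neighbors."""
    ranked = sorted(
        zip(train_features, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical two-component fuzzy entropy features per signal.
    features = [(0.10, 0.20), (0.15, 0.25), (0.20, 0.10),
                (0.80, 0.90), (0.85, 0.80), (0.90, 0.95)]
    labels = ["normal", "normal", "normal", "fault", "fault", "fault"]
    print(knn_predict(features, labels, (0.82, 0.88)))
```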

Ant colony optimization algorithm based on Spark

WANG Zhaoyuan, WANG Hongjie, XING Huanlai, LI Tianrui

Journal of Computer Applications 2015, 35 (10): 2777-2780. DOI: 10.11772/j.issn.1001-9081.2015.10.2777

To deal with combinatorial optimization problems in the era of big data, a parallel Ant Colony Optimization (ACO) algorithm based on Spark, a framework for distributed in-memory computing, was presented. To parallelize the solution construction phase of ant colony optimization, a class of ants was encapsulated into a resilient distributed dataset and the corresponding transformation operators were given. The simulation results on the Traveling Salesman Problem (TSP) demonstrate the feasibility of the proposed parallel algorithm. Under the same experimental environment, the comparison between a MapReduce-based ant colony algorithm and the proposed algorithm shows that the proposed algorithm is at least ten times faster in optimization than the MapReduce-based one.
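The structure described here, in which each ant builds a tour independently so that the construction phase can be expressed as a map over the ants, can be sketched in plain Python. This is a single-process sketch, not the paper's Spark implementation: the built-in `map` stands in for a distributed transformation, and the parameter values (`alpha`, `beta`, `rho`, ant and iteration counts) are assumptions.

```python
import math
import random


def tour_length(tour, dist):
    """Length of the closed tour that returns to its first city."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))


def construct_tour(seed, dist, pher, alpha=1.0, beta=2.0):
    """One ant's solution construction; it only reads shared state,
    so many ants can run this step independently."""
    rng = random.Random(seed)
    tour = [rng.randrange(len(dist))]
    unvisited = set(range(len(dist))) - {tour[0]}
    while unvisited:
        here = tour[-1]
        options = sorted(unvisited)
        weights = [pher[here][j] ** alpha * (1.0 / dist[here][j]) ** beta
                   for j in options]
        nxt = rng.choices(options, weights=weights)[0]
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def ant_colony(dist, ants=10, iterations=30, rho=0.5, seed=0):
    n = len(dist)
    pher = [[1.0] * n for _ in range(n)]
    best_tour, best_len = None, math.inf
    for it in range(iterations):
        seeds = [seed + it * ants + a for a in range(ants)]
        # Map step: the part a distributed framework would parallelize.
        tours = list(map(lambda s: construct_tour(s, dist, pher), seeds))
        # Evaporation, then deposit proportional to tour quality.
        for row in pher:
            for j in range(n):
                row[j] *= 1.0 - rho
        for tour in tours:
            length = tour_length(tour, dist)
            for i in range(n):
                a, b = tour[i], tour[(i + 1) % n]
                pher[a][b] += 1.0 / length
                pher[b][a] += 1.0 / length
            if length < best_len:
                best_tour, best_len = tour, length
    return best_tour, best_len


if __name__ == "__main__":
    cities = [(0, 0), (0, 1), (1, 1), (1, 0)]
    dist = [[math.dist(p, q) for q in cities] for p in cities]
    print(ant_colony(dist))
```

The pheromone update is kept on a single node here; only the tour construction is written as a side-effect-free function, which is the property that lets it be mapped over a distributed collection of ants.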

Medium access control protocol with network utility maximization and collision avoidance for wireless sensor networks

LIU Tao, LI Tianrui, YIN Feng, ZHANG Nan

Journal of Computer Applications 2014, 34 (11): 3196-3200. DOI: 10.11772/j.issn.1001-9081.2014.11.3196

To avoid transmission collisions and improve energy efficiency in periodic-report Wireless Sensor Networks (WSNs), a Medium Access Control (MAC) protocol with network utility maximization and collision avoidance, called UM-MAC, was proposed. UM-MAC used a Time Division Multiple Access (TDMA) scheduling mechanism and introduced a utility model into the slot assignment process. Based on the utility model, a utility maximization problem that jointly optimizes link reliability and energy consumption was formulated. To solve it, a heuristic algorithm was proposed that enables the network to quickly find a slot scheduling strategy that maximizes network utility and avoids transmission collisions. Comparison experiments among the UM-MAC, S-MAC and CA(Collision Avoidance)-MAC protocols were conducted on networks with different numbers of nodes. UM-MAC achieved larger network utility and a higher average packet delivery ratio, and its lifetime was between those of S-MAC and CA-MAC, while its average transmission delay increased under networks with different loads. The simulation results show that UM-MAC can achieve collision avoidance and improve network performance in terms of packet delivery ratio and energy efficiency; meanwhile, the TDMA-based protocol is not better than the competition-based protocol in low-load networks.
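The collision-avoidance half of the slot assignment problem can be illustrated with a greedy coloring of a conflict graph: two nodes that would collide never receive the same TDMA slot. This is a generic stand-in, not UM-MAC's heuristic. The utility model (link reliability and energy consumption) is omitted, and the function name, node names, and ordering rule are assumptions.

```python
def assign_slots(conflicts):
    """Greedy collision-free slot assignment.

    conflicts maps each node to the set of nodes it must not share a slot
    with. Nodes with more conflicts are scheduled first.
    """
    slots = {}
    order = sorted(conflicts, key=lambda node: (-len(conflicts[node]), node))
    for node in order:
        taken = {slots[nb] for nb in conflicts[node] if nb in slots}
        slot = 0
        while slot in taken:
            slot += 1
        slots[node] = slot
    return slots


if __name__ == "__main__":
    # Hypothetical four-node network.
    conflicts = {
        "A": {"B", "C"},
        "B": {"A", "C"},
        "C": {"A", "B", "D"},
        "D": {"C"},
    }
    print(assign_slots(conflicts))
```

A utility-aware heuristic would replace the "lowest free slot" rule with a choice that scores each feasible slot, while keeping the same conflict check.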
